
chore: split the output of SRF by max_block_size. #13817

Merged: 8 commits merged into databendlabs:main from issue-13815 on Nov 28, 2023


RinChanNOWWW (Contributor) commented on Nov 27, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Add a new transform type, BlockingTransform, that blocks processing (i.e. stops pulling new data from its input port) while its current output has not yet been fully consumed.

We can use BlockingTransform for SRFs (set-returning functions) to reduce memory usage:

If applying an SRF (such as unnest) to a block produces too large an output, we generate and emit only part of the result at a time (at most max_block_size rows). The transform does not read new input data until the downstream processor has consumed the whole result.

This lets the processor release memory as soon as the data is consumed.
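A minimal sketch of the batching idea in the PR title, assuming unnest as the SRF and a drastically simplified data model: a "block" is just a `Vec` of per-row `i64` arrays, and `UnnestBatcher`, `need_input`, `push_input` and `next_output` are hypothetical names, not Databend's API. Following the review comment that the splitting has to happen inside the operator, the sketch never materializes the full unnest result; it keeps a cursor into the input block and generates each output batch on demand.

```rust
/// Hypothetical sketch (not Databend's API): lazily expands `unnest` over one
/// input block and emits the result in batches of at most `max_block_size` rows.
/// A "block" is simplified to a Vec of per-row arrays of i64.
struct UnnestBatcher {
    max_block_size: usize,
    /// Current input block: one array per input row.
    input: Vec<Vec<i64>>,
    /// Index of the next input row to expand.
    row: usize,
    /// Position of the next unread element inside `input[row]`.
    offset: usize,
}

impl UnnestBatcher {
    fn new(max_block_size: usize) -> Self {
        assert!(max_block_size > 0, "max_block_size must be positive");
        Self { max_block_size, input: Vec::new(), row: 0, offset: 0 }
    }

    /// New data is pulled from the input port only once the current block
    /// has been fully expanded and emitted.
    fn need_input(&self) -> bool {
        self.row >= self.input.len()
    }

    fn push_input(&mut self, block: Vec<Vec<i64>>) {
        assert!(self.need_input(), "previous block not fully consumed yet");
        self.input = block; // the fully consumed previous block is dropped here
        self.row = 0;
        self.offset = 0;
    }

    /// Produces the next batch of at most `max_block_size` rows, or `None`
    /// once the current input block is exhausted.
    fn next_output(&mut self) -> Option<Vec<i64>> {
        let mut out = Vec::new();
        while out.len() < self.max_block_size && self.row < self.input.len() {
            let array = &self.input[self.row];
            let take = (array.len() - self.offset).min(self.max_block_size - out.len());
            out.extend_from_slice(&array[self.offset..self.offset + take]);
            self.offset += take;
            if self.offset == array.len() {
                // Row fully expanded; empty arrays produce no output rows.
                self.row += 1;
                self.offset = 0;
            }
        }
        if out.is_empty() { None } else { Some(out) }
    }
}

fn main() {
    let mut batcher = UnnestBatcher::new(3);
    let inputs = vec![
        vec![vec![1, 2], vec![3, 4, 5, 6], vec![]],
        vec![vec![7]],
    ];
    let mut outputs = Vec::new();
    for block in inputs {
        batcher.push_input(block);
        // Drain the whole result before pulling the next input block.
        while let Some(batch) = batcher.next_output() {
            outputs.push(batch);
        }
    }
    println!("{:?}", outputs); // prints [[1, 2, 3], [4, 5, 6], [7]]
}
```

With this shape, at most one output batch of `max_block_size` rows exists at a time (plus the input block itself), so each batch's memory can be freed as soon as the downstream consumer drops it.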



github-actions (bot) added the pr-chore label (this PR only has small changes that do not need to be recorded, like coding styles) on Nov 27, 2023

sundy-li (Member) commented:

This is incorrect; we should implement it inside the operator. The current implementation does not reduce memory usage.

RinChanNOWWW (Contributor, Author) commented:

OK, I misunderstood the requirement.

RinChanNOWWW marked this pull request as draft on Nov 27, 2023, 07:11
RinChanNOWWW changed the title from "chore: split the input block of BlockOperator::FlatMap." to "chore: split the output of SRF by max_block_size." on Nov 27, 2023
RinChanNOWWW marked this pull request as ready for review on Nov 27, 2023, 13:17
RinChanNOWWW marked this pull request as draft on Nov 28, 2023, 01:30
RinChanNOWWW marked this pull request as ready for review on Nov 28, 2023, 05:07
BohuTANG added the ci-cloud label (Build docker image for cloud test) on Nov 28, 2023
Contributor

Docker Image for PR

  • tag: pr-13817-20b7e2a

note: this image tag is only available for internal use,
please check the internal doc for more details.

BohuTANG (Member) commented:

Great, large blocks in SRF now work 👍

BohuTANG merged commit ed0b275 into databendlabs:main on Nov 28, 2023 (84 checks passed)
RinChanNOWWW deleted the issue-13815 branch on Nov 29, 2023, 01:04
Labels: ci-cloud (Build docker image for cloud test), pr-chore (this PR only has small changes that do not need to be recorded, like coding styles)
Development: successfully merging this pull request may close this issue:

Making BlockOperator::FlatMap output result by batches by max_block_size
3 participants